Technical Report: Ratio Threshold Queries over Distributed Data Sources

Authors

  • Rajeev Gupta
  • Krithi Ramamritham
  • Mukesh Mohania
Abstract

Continuous aggregation queries over dynamic data are used for real-time decision making and timely business intelligence. In this paper we consider queries where a client wants to be notified if the ratio of two aggregates over distributed data crosses a specified threshold. Consider these scenarios: a mechanism designed to defend against distributed denial-of-service attacks may be triggered when the packets arriving at a subnet exceed 5% of the total packets; or a distributed store chain withdraws its discount on luxury goods when sales of luxury goods constitute more than 20% of overall sales. The challenge in executing such ratio threshold queries (RTQs) lies in incurring only the minimal communication necessary to propagate updates from the data sources to the aggregator node where the client query is executed. We address this challenge by proposing schemes for converting the client ratio threshold condition into conditions on the individual distributed data sources. Whenever the condition associated with a source is violated, the source pushes its data values to the aggregator, which in turn pulls data values from other sources to determine whether the client threshold condition is indeed violated. We present algorithms that minimize the number of source condition violations (i.e., the number of pushes) while ensuring that no violation of the client threshold condition is missed. Further, in case of a source condition violation, we propose efficient selective pulling algorithms for intelligently choosing the additional sources whose data should be pulled by the aggregator. Using performance evaluation on synthetic and real traces of data updates, we show that our algorithms send up to an order of magnitude fewer messages than existing approaches in the literature.
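The push/pull idea in the abstract can be illustrated with a deliberately naive sketch. It is not the authors' algorithm: it assumes the global condition sum(x) / sum(y) > tau is rewritten as sum(x_i - tau * y_i) > 0 (valid when the total of y is positive), splits the current headroom evenly across sources, and pulls from every source on a push instead of pulling selectively. The names `Source`, `assign_bounds`, and `handle_push` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Source:
    x: float            # numerator contribution, e.g. luxury sales at one store
    y: float            # denominator contribution, e.g. all sales at that store
    bound: float = 0.0  # local bound set by the aggregator (hypothetical name)

    def weighted(self, tau: float) -> float:
        # Per-source share of sum(x_i - tau * y_i).
        return self.x - tau * self.y

    def must_push(self, tau: float) -> bool:
        # Local condition: push only when the share exceeds the local bound.
        return self.weighted(tau) > self.bound


def assign_bounds(sources: list[Source], tau: float) -> None:
    """Give every source an equal share of the current headroom.

    Assumes the global condition is not currently violated. The bounds
    then sum to zero, so if no source exceeds its bound, the global sum
    stays <= 0 and the ratio cannot have crossed tau.
    """
    headroom = -sum(s.weighted(tau) for s in sources)
    for s in sources:
        s.bound = s.weighted(tau) + headroom / len(sources)


def handle_push(sources: list[Source], tau: float) -> bool:
    """Naive aggregator step: pull all sources and test the global condition.

    Returns True if sum(x) > tau * sum(y); otherwise re-assigns the local
    bounds from the freshly pulled values and returns False.
    """
    total_x = sum(s.x for s in sources)
    total_y = sum(s.y for s in sources)
    crossed = total_x > tau * total_y
    if not crossed:
        assign_bounds(sources, tau)
    return crossed
```

In this sketch a source stays silent as long as `must_push` is false, so messages are exchanged only when a local bound is exceeded. The even split of headroom is an arbitrary choice made for brevity; the abstract indicates the paper instead chooses source conditions to minimize pushes and pulls only selected sources.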

Similar resources

Distributed Threshold Querying of General Functions by a Difference of Monotonic Representation

The goal of a threshold query is to detect all objects whose score exceeds a given threshold. This type of query is used in many settings, such as data mining, event triggering, and top-k selection. Often, threshold queries are performed over distributed data. Given database relations that are distributed over many nodes, an object’s score is computed by aggregating the value o...

Verteilung globaler Anfragen auf heterogene Stromverarbeitungssysteme

Deployment of Global Queries in Distributed and Heterogeneous Stream Processing Systems. Distributed in-network stream processing is more efficient than sending all data to a central processing unit. In the past few years, Stream-Processing Systems (SPSs) have established themselves as an interesting alternative to database systems for continuous query processing. There are many scenarios having w...

Search for the Best but Expect the Worst - Distributed Top-k Queries over Decreasing Aggregated Scores

We consider distributed top-k queries in wide-area networks where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers. In contrast to existing work, we exclusively consider distributed top-k queries over decreasing aggregated values. State-of-the-art distributed top-k algorithms usually depend on threshold propagation to reduce expen...

Using First-Order Logic to Query Heterogeneous Internet Data Sources

This paper describes an approach to formulating queries in the language of first-order logic over data from disparate sources distributed over a network. The data sources are treated as if they were all in a common database. The data sources may incorporate different stored or computed methods of providing data: web services and REST APIs, XML/JSON repositories, web pages, full-featured databases...

DeXIN: An Extensible Framework for Distributed XQuery over Heterogeneous Data Sources

In the Web environment, rich, diverse sources of heterogeneous and distributed data are ubiquitous. In fact, even the information characterizing a single entity like, for example, the information related to a Web service is normally scattered over various data sources using various languages such as XML, RDF, and OWL. Hence, there is a strong need for Web applications to handle queries over het...

Publication date: 2013